An annotated English child language database
Abstract
Large-scale naturalistic data has opened up new investigative possibilities for language acquisition studies, providing a basis for empirical predictions and for evaluating alternative acquisition hypotheses. One widely used resource is CHILDES (MacWhinney, 1995), which contains transcriptions of interactions involving children in over 25 languages; the English corpora are available in raw, part-of-speech tagged, lemmatized, and parsed formats (Sagae et al., 2010; Buttery and Korhonen, 2005). With the recent increase in the availability of lexical and psycholinguistic resources and of robust natural language processing tools, it is now possible to enrich child-language corpora with additional sources of information. In this paper we describe the English CHILDES Verb Database (ECVD), which extends the original lexical and syntactic annotation of verbs in CHILDES with frequency, grammatical relations, semantic classes, and other psycholinguistic and statistical information. In addition, the corpora are organized in a searchable database that supports complex queries combining these different sources of information. The database is also modular and can be straightforwardly extended with additional annotation levels. In what follows, we discuss the tools and resources used for the annotation (§2) and conclude with a discussion of the implications of this initial work along with directions for future research (§3).
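To make the idea of a query that combines several annotation sources concrete, the following is a minimal sketch in Python with SQLite. It is not the ECVD schema or interface, which the abstract does not specify: the table names (`verb`, `occurrence`), the column names, the helper functions, and all rows are hypothetical toy values chosen for illustration only.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout.

    Assumption: one table holds per-verb annotations (frequency, semantic
    class) and another holds per-occurrence annotations (speaker,
    grammatical relation). All rows are invented toy data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE verb (
            lemma TEXT PRIMARY KEY,
            frequency INTEGER,
            semantic_class TEXT
        );
        CREATE TABLE occurrence (
            lemma TEXT REFERENCES verb(lemma),
            speaker TEXT,
            relation TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO verb VALUES (?, ?, ?)",
        [
            ("give", 120, "transfer"),
            ("put", 300, "placement"),
            ("send", 15, "transfer"),
        ],
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [
            ("give", "CHI", "OBJ"),
            ("give", "MOT", "OBJ"),
            ("put", "CHI", "OBJ"),
            ("send", "MOT", "OBJ"),
        ],
    )
    return conn


def child_verbs(conn, semantic_class, min_frequency):
    """Return lemmas in a semantic class, at or above a frequency threshold,
    that the child speaker uses with a direct object."""
    rows = conn.execute(
        """
        SELECT DISTINCT v.lemma
        FROM verb AS v
        JOIN occurrence AS o ON o.lemma = v.lemma
        WHERE v.semantic_class = ?
          AND v.frequency >= ?
          AND o.speaker = 'CHI'
          AND o.relation = 'OBJ'
        ORDER BY v.lemma
        """,
        (semantic_class, min_frequency),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `child_verbs(build_demo_db(), "transfer", 100)` joins the lexical, statistical, and syntactic columns in a single query. Keeping each annotation level in its own table or column is one plausible way to get the modularity the abstract mentions, since a new level could be added without touching existing ones.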
Publication date: 2012